perm filename PUBS[PUB,MUS] blob
sn#490510 filedate 1980-01-03 generic text, type C, neo UTF8
INTERNAL PUBLICATIONS
Center for Computer Research in Music and Acoustics
Artificial Intelligence Laboratory
Stanford University
Stanford, California 94305
The Stanford computer music group produces technical memoranda, describing
results of the research done at Stanford. We can offer these memoranda to the
public, but we request that we be reimbursed for publication costs by a donation
of the amount listed by each memo. This donation goes exclusively into the
publication funds for the project and helps us bring this work to the public.
Make checks payable to Stanford University. The donation is tax-deductible.
Some reprints of national publications are available from Stanford at the noted
suggested prices.
STAN-M-1 July, 1974 $5.65
"Computer Simulation of Music Instrument Tones in Reverberant
Environments"
by John M. Chowning, John M. Grey, Loren Rush,
and James A. Moorer
This is a reprint of selected portions of the NSF proposal which resulted in a
grant to the computer music group for research over a two-year period. The
following is the abstract from the memo:
Novel and powerful computer simulation techniques have been developed which
produce realistic music instrument tones that can be dynamically moved to
arbitrary positions within a simulated reverberant space of arbitrary size by
means of computer control of four loudspeakers. Research support for the
simulation of complex auditory signals and environments will allow the further
development and application of computer techniques for digital signal
processing, graphics, and computer based subjective scaling, toward the
analysis, data reduction, and synthesis of music instrument tones and
reverberant spaces. Main areas of inquiry are: 1) those physical
characteristics of a tone which have perceptual significance, 2) the simplest
data base for perceptual representation of a tone, 3) the effect of
reverberation and location on the perception of a tone, and 4) optimum
artificial reverberation techniques and position and number of loudspeakers for
producing a full illusion of azimuth, distance, and altitude. These areas have
been scantily investigated, if at all, and they bear on a larger, more profound
problem of intense cross-disciplinary interest: the cognitive processing and
organization of auditory stimuli. The advanced state of computer technology now
makes possible the realization of a small computer system for the purpose of
real-time simulation. The proposed research includes the specification of, and
program development for, a small special purpose computing system for real-time,
interactive acoustical signal processing. The research in simulation and system
development has significant applications in a variety of areas including
psychology, education, architectural acoustics, audio engineering, and music.
STAN-M-2 February, 1975 $7.10
"An Exploration of Musical Timbre"
by John M. Grey
This is a reprint of John Grey's doctoral dissertation, submitted to the
Department of Psychology, Stanford University.
Due to its overwhelming complexity, timbre perception is a poorly understood
subject in the field of auditory perception. Computer-based research tools have
been developed that appear to be important for an investigation of timbre
perception. In the work to be described, an exploratory approach was formulated
for dealing with this highly multidimensional attribute of sound. This approach
utilized a computer technique for the synthesis of musical timbres based on the
analysis of natural instrument tones. This technique was useful for generating
stimuli in timbre experiments because of its effectiveness in allowing the
investigator to specify and manipulate the physical properties of complex
time-variant tones. An important discovery resulted suggesting that naturalistic
tones can be synthesized from a vastly simplified set of physical properties.
These simplified tones were useful as stimuli in further studies on timbre
perception because of the great reduction in the number of physical factors to
be considered in making psychophysical interpretations of perceptual data.
Another study undertook to equalize a set of tones in the dimensions of pitch,
loudness and duration, in order to eliminate confounding factors from future
judgments on different timbres. The simplified and matched tones were rated by
pairwise similarity in a further study, and the results were treated with
computer-based multidimensional scaling techniques to obtain an interpretable
data structure in a low dimensionality. Three dimensions were found to explain
the similarity data. Two were related to obvious physical properties of the
tones (to the gross characteristics of the spectral energy distribution; and to
the existence of precedent low-amplitude, high-frequency, and possibly
inharmonic energy in the initial segment of the attack). The third dimension was
interpretable either in terms of a physical property (synchronicity in the
attacks of higher harmonics) or as a higher-level distinction made between the
tones on the basis of their musical instrument family. Another set of studies
next initiated an exploration of timbre in terms of continuous versus
categorical perception. An algorithm was designed to generate a set of tones
interpolating between two naturalistic timbres. Identification, discrimination
and perceptual similarity studies were performed using a set of stimuli
generated by interpolations. The results of these studies suggested that
interpolations were perceived to be continuous rather than categorical.
Furthermore, the timbral similarities between a partial set of the naturalistic
and interpolated tones revealed three perceptual dimensions that related
directly to those found above for the total set of naturalistic stimuli. The
first two physically-related dimensions were found, and the third dimension
seemed to correspond to a higher-order distinction made between naturalistic
tones and the interpolation-derived tones, this superseding the family
distinction made for the total set of naturalistic tones. A notion of timbre is
developed involving both a higher-level perceptual processing of tones that has
access to stored information relating to the distinctive features of
identifiable sources, and a lower-level, qualitative perceptual comparison of
tones with respect to gross acoustical features lying outside of the domain of
specific identification. Suggestions for future research are made.
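The abstract does not describe the interpolation algorithm itself. As a purely
illustrative sketch, suppose each tone is represented as a list of per-harmonic
amplitude envelopes sampled at the same time points. That representation, the
function name, and the choice of plain linear interpolation are all assumptions
made here for illustration and are not taken from the dissertation. A series of
intermediate tones between two timbres might then be generated as follows:

```python
def interpolate_timbres(tone_a, tone_b, steps):
    """Return `steps` tones moving linearly from tone_a to tone_b.

    Each tone is a list of amplitude envelopes, one per harmonic, and each
    envelope is a list of amplitudes at shared time points (hypothetical
    layout). Both tones must have the same shape, and steps must be >= 2.
    """
    if steps < 2:
        raise ValueError("need at least the two endpoint tones")
    tones = []
    for i in range(steps):
        w = i / (steps - 1)  # 0.0 at tone_a, 1.0 at tone_b
        tone = [
            [(1 - w) * xa + w * xb for xa, xb in zip(env_a, env_b)]
            for env_a, env_b in zip(tone_a, tone_b)
        ]
        tones.append(tone)
    return tones
```

With steps set to 3, the result holds the two original tones and one tone
halfway between them. A real procedure would presumably also have to
interpolate frequency functions and align attack times, which this sketch
ignores.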
STAN-M-3 May, 1975 $8.60
"On the Segmentation and Analysis of Continuous Musical Sound by
Digital Computer"
by James A. Moorer
This is a reprint of James Moorer's doctoral dissertation, submitted to the
Department of Computer Science, Stanford University.
The problem addressed by this dissertation is that of the transcription of
musical sound by computer. A piece of polyphonic musical sound is digitized and
stored in the computer. A completely automatic procedure then takes the
digitized waveform and produces a written manuscript which describes in
classical musical notation what notes were played. We do not attempt to identify
the instruments involved. The program does not need to know what instruments
were playing.
It would appear that it is quite difficult to achieve human performance in
taking musical dictation. To simplify the task, certain restrictions have been
placed on the problem: (1) The pieces must have no more than two independent
voices. (2) Vibrato and glissando must not be present. (3) Notes must be no
shorter than 80 milliseconds. (4) The fundamental frequency of a note must not
coincide with a harmonic of a simultaneously sounding note of a different
frequency. The first three conditions are not inherent limitations in the
procedures, but were imposed simply for convenience. The last condition would seem
to require more study to determine the cues that human listeners use to
distinguish, for example, notes at unison or octaves. Numerous other lesser
restrictions were also imposed on the music to be analysed.
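Restrictions (3) and (4) above can be stated as simple numeric checks. The
following is a minimal sketch, not code from the dissertation: the function
names, the 20-cent tolerance, and the limit of 16 harmonics are assumptions
chosen only for illustration.

```python
import math

MIN_NOTE_SECONDS = 0.080  # restriction (3): notes no shorter than 80 ms


def cents_between(f_a, f_b):
    """Signed interval from f_b to f_a in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_a / f_b)


def fundamental_on_harmonic(f1, f2, max_harmonic=16, tolerance_cents=20.0):
    """Restriction (4): True if the higher fundamental lies on a harmonic
    (2nd or above) of the lower one, within the assumed tolerance."""
    low, high = sorted((f1, f2))
    for k in range(2, max_harmonic + 1):
        if abs(cents_between(high, k * low)) <= tolerance_cents:
            return True
    return False


def note_long_enough(duration_seconds):
    """Restriction (3): True if the note lasts at least 80 milliseconds."""
    return duration_seconds >= MIN_NOTE_SECONDS
```

Under these assumptions, two notes an octave apart (220 Hz and 440 Hz) are
flagged, while two notes a fifth apart (220 Hz and 330 Hz) are not. Unisons
are not flagged, because the restriction as worded applies only to notes of
different frequency.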
The method used for this analysis is a directed bank of sharp-cutoff bandpass
filters. First, a pitch detector is used to determine the harmony of the piece
at each point in time. Using the harmony information, the frequencies of a bank
of bandpass filters are determined so as to assure that every harmonic of every
instrument will pass through at least one of the filters. The output of each
filter is processed by a pitch detector and an energy detector. This gives power
and frequency information as functions of time. Each power and frequency
function pair is rated as to its quality. The rating takes into account the
constancy of the frequency function, the smoothness of the power function, and
several other measurements on the functions. This rating is used to eliminate
spurious traces and null filter outputs.
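One way to picture the placement of the filters is sketched below. It assumes
that the harmony at a given moment is available as a list of fundamental
frequencies, and it places one filter center on each harmonic up to a chosen
upper limit, merging centers that nearly coincide. The function name, the
5000 Hz limit, and the 3 percent merge ratio are hypothetical. The abstract
does not say how the dissertation actually chose the filter frequencies.

```python
def filter_centers(fundamentals, max_freq=5000.0, merge_ratio=1.03):
    """Return sorted center frequencies covering every harmonic of every
    fundamental up to max_freq (illustrative placement rule only)."""
    centers = sorted(
        k * f0
        for f0 in fundamentals
        for k in range(1, int(max_freq // f0) + 1)
    )
    merged = []
    for c in centers:
        # Skip a center that lies within merge_ratio of the last one kept,
        # so a single filter serves harmonics shared by two notes.
        if merged and c <= merged[-1] * merge_ratio:
            continue
        merged.append(c)
    return merged
```

For fundamentals of 220 Hz and 330 Hz, the shared harmonic at 660 Hz receives
a single filter rather than two.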
Notes are then inferred from groups of power and frequency function pairs that
occur simultaneously with frequencies that are harmonically related. Notes with
higher overall ratings are preferred over other note hypotheses. The melodies
are then grouped by separating the notes into the higher voice and the lower
voice. Voice crossings are not tracked. For the final manuscripting, Professor
Leland Smith's MSS program was used. The analysis program directly produces
input for the manuscripting program, so the entire procedure is automated.
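The inference of notes from harmonically related traces can be illustrated
with a greedy grouping. This is a simplified sketch under stated assumptions,
not the rated-hypothesis procedure of the dissertation: the input is assumed
to be the mean frequencies of traces already known to overlap in time, the
lowest unassigned frequency is taken as a candidate fundamental, and the 2
percent tolerance and the function name are hypothetical.

```python
def group_harmonics(freqs, tolerance=0.02):
    """Group simultaneous trace frequencies into (fundamental, members)."""
    remaining = sorted(freqs)
    notes = []
    while remaining:
        f0 = remaining[0]  # lowest unassigned trace is the candidate fundamental
        members, rest = [], []
        for f in remaining:
            k = round(f / f0)  # nearest harmonic number
            if k >= 1 and abs(f - k * f0) <= tolerance * k * f0:
                members.append(f)
            else:
                rest.append(f)
        notes.append((f0, members))
        remaining = rest
    return notes
```

Given traces at 220, 330, 440, 880 and 990 Hz, the sketch yields two notes:
one on 220 Hz with members 220, 440 and 880, and one on 330 Hz with members
330 and 990. A greedy rule like this cannot choose between competing
hypotheses, which is the role the abstract assigns to the overall ratings.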
In addition to the above described system, many other techniques were examined
for their utility in this task. Each technique that was explored is described
and analysed, with a description of why it was not found useful for this task.
One interesting observation is that there is considerably more activity in a
piece of music than is perceived by the listener. This is especially common with
stringed instruments, because the strings that are not being manipulated
invariably resonate and produce sounds independently which are generally not
heard due to aural masking. This indicates that perhaps we should use more
perceptually-based techniques to help determine what would actually be heard in
a piece of music, rather than determine exactly what is there, although detailed
descriptions of the contents of the piece may be useful for other purposes, such
as music education or musicology.
In general, the system works tolerably well on the restricted class of musical
sound. Examples are shown which demonstrate the viability of the system for
different instruments and musical styles. Since the procedure is extremely
costly in terms of computer time, only a limited number of examples could be
processed. These examples are discussed with a description of how the system
could be improved and how the restrictions might be eliminated by better
processing techniques.
STAN-M-4 February, 1975 $1.80
"On the Loudness of Complex, Time-Variant Tones"
by James A. Moorer
This memo is part of a proposal to the NSF division of Psychobiology.
This study of loudness is motivated by the discovery that a set of complex,
time-variant tones appears to behave differently with respect to loudness than
would be predicted by the methods proposed in the literature. It is possible
that the time-variant behavior of the sounds influences the loudness, so that a
more complete theory of loudness must take this behavior into account. We thus
propose to study these data and attempt to either verify the existing theories
of loudness or formulate a more comprehensive hypothesis of loudness, building
upon the currently existing theories, and to test this hypothesis by
synthesizing new tones, doing equalization experiments, and comparing the
results with the predictions of the model of loudness perception.
STAN-M-5 December, 1975 $3.00
"The Synthesis of Complex Audio Spectra by Means of Discrete
Summation Formulae"
by James A. Moorer
A new family of economical and versatile synthesis techniques has been
discovered that provides a means of controlling the spectra of audio
signals that has capabilities and control similar to those of Chowning's
frequency modulation technique. The advantages of the current methods
over frequency modulation synthesis are that the signal can be exactly
limited to a specified number of partials, and that "one-sided" spectra
can be conveniently synthesized.
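To illustrate the idea of a discrete summation formula, the sketch below uses
the standard closed form for a finite geometric series of sinusoids, which
follows from summing a complex geometric series. Whether the memo uses this
exact form and notation is an assumption; the symbols a, theta, beta and n are
chosen here for illustration. The sum contains exactly n + 1 partials, and
because the partial frequencies extend in only one direction from theta, the
spectrum is one-sided.

```python
import math


def partial_sum_direct(a, theta, beta, n):
    """Sum of a**k * sin(theta + k*beta) for k = 0..n, term by term."""
    return sum(a**k * math.sin(theta + k * beta) for k in range(n + 1))


def partial_sum_closed(a, theta, beta, n):
    """The same sum in closed form: a fixed number of operations,
    regardless of n. Undefined when a == 1 and cos(beta) == 1."""
    numerator = (
        math.sin(theta)
        - a * math.sin(theta - beta)
        - a ** (n + 1)
        * (math.sin(theta + (n + 1) * beta) - a * math.sin(theta + n * beta))
    )
    denominator = 1.0 + a * a - 2.0 * a * math.cos(beta)
    return numerator / denominator
```

The two functions agree to within rounding error, but the closed form costs
the same for 5 partials as for 50, which is the sense in which such
techniques are economical.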
NOTE: This document is no longer printed, because it is superseded by the
Audio Engineering Society Journal article:
James A. Moorer, "The Synthesis of Complex Audio Spectra by Means of
Discrete Summation Formulae", Journal of the Audio Engineering Society,
Volume 24, #9, November 1976, pp 717-727
Reprints of this article can be ordered directly from the Audio
Engineering Society, 60 East 42nd Street, New York, N.Y. 10017
REPRINT August, 1977 $3.00
"Signal Processing Aspects of Computer Music: A Survey"
Invited Paper for the Proceedings of the IEEE, Volume 65, Number 8,
pp1108-1137
by James A. Moorer
The application of modern digital signal processing techniques to the
production and processing of musical sound gives the composer and musician
a level of freedom and precision of control never before obtainable. This
paper surveys the use of analysis of natural sounds for synthesis, the use
of speech and vocoder techniques, methods of artificial reverberation, the
use of discrete summation formulae for highly efficient synthesis, the
concept of the all-digital recording studio, and discusses the role of
special-purpose hardware in digital music synthesis, illustrated with two
unique digital music synthesizers.
BIBLIOGRAPHY OF NATIONAL PUBLICATIONS
NOTE: Reprints of some of these are available from Stanford at costs
listed below.
John M. Chowning, "The Simulation of Moving Sound Sources", Journal of the
Audio Engineering Society, Volume 19, #1, 1971
John M. Chowning, "The Synthesis of Complex Audio Spectra by Means of
Frequency Modulation", Journal of the Audio Engineering Society, Volume
21, #7, September 1973, pages 526-534
James A. Moorer, "The Optimum Comb Method of Pitch Period Analysis of
Continuous Digitized Speech", IEEE Trans. on Acoustics, Speech, and Signal
Processing, Vol. ASSP-22, #5, October 1974, pp330-338
James A. Moorer, "On the Transcription of Musical Sound by Digital
Computer". Presented at the Second USA-JAPAN Computer Conference, August,
1975, reprinted in the Computer Music Journal, Volume 1, #4, November
1977, pp32-38
James A. Moorer, "The Synthesis of Complex Audio Spectra
by Means of Discrete Summation Formulae", Journal of the Audio Engineering
Society, Volume 24, #9, November 1976, pp 717-727 (this supersedes memo
STAN-M-5)
John M. Grey, "Multidimensional Perceptual Scaling of Musical Timbres",
Journal of the Acoustical Society of America, Volume 61, #5, May 1977,
pp1270-1277
John M. Grey, James A. Moorer, "A Perceptual Evaluation of Synthetic Music
Instrument Tones", Journal of the Acoustical Society of America,
Volume 62, pp454-462, August 1977
James A. Moorer, "Signal Processing Aspects of Computer Music - A Survey".
Invited Paper, Proceedings of the IEEE, Volume 65, #8, August, 1977,
pp1108-1137. Reprinted in Computer Music Journal, Volume 1, #1, 1977.
(Reprint available from Stanford at $3.00)
James A. Moorer, "The Use of the Phase Vocoder in Computer Music
Applications". Journal of the Audio Engineering Society, 1978
John M. Grey, John W. Gordon, "Perceptual Effects of Spectral
Modifications on Musical Timbres", Journal of the Acoustical Society of
America, Volume 63, #5, May 1978, pp1493-1500
John W. Gordon, John M. Grey, "Perception of Spectral Modifications on
Orchestral Instrument Tones," Computer Music Journal, Volume 2, #1, July
1978, pp24-31
James A. Moorer, "How Does a Computer Make Music?". Computer Music
Journal, Volume 2, Number 1, July 1978, pp32-37
John M. Grey, "Timbre Discrimination in Musical Patterns," Journal of the
Acoustical Society of America, Volume 64, #2, August 1978, pp467-472
James A. Moorer, "On the Coding of High-Quality Digitized Sound".
Presented at the 1979 European Conference of the Audio Engineering
Society, Brussels, Belgium, February 1979. Accepted for publication in the
Journal of the Audio Engineering Society
James A. Moorer, "The Use of Linear Prediction of Speech in Computer Music
Applications". Journal of the Audio Engineering Society, Volume 27, #3,
March, 1979, pp134-140. Preprinted in French in "Journees Des Etudes",
Festival du Son, June, 1979.
James A. Moorer, "About this Reverberation Business", Computer Music
Journal, Volume 3, #2, June 1979, pp13-28
James A. Moorer, "The 4C Machine", with A. Chauveau, C. Abbott, P. Eastty,
and J. Lawson, Computer Music Journal, Volume 3, #3, September 1979,
pp16-24
James A. Moorer, "HELP!", Letter to the Editor, Computer Music Journal,
Volume 3, #3, September 1979, p4
IN PREPARATION
John M. Grey, "Perceptual Continuity of Interpolations Between Musical
Timbres", for the Journal of the Acoustical Society of America
John M. Grey, "Multidimensional Scaling of Interpolated Music Instrument
Tones", for the Journal of the Acoustical Society of America